

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Neural Information Processing Systems

With the capability of modeling bidirectional contexts, denoising-autoencoding-based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, by relying on corrupting the input with masks, BERT neglects dependencies between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Empirically, under comparable experimental settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.
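The core objective the abstract describes — maximizing the expected log-likelihood over permutations of the factorization order — can be illustrated with a minimal sketch. This is not the XLNet implementation (which uses a Transformer-XL backbone with two-stream attention); the conditional model here is a hypothetical stand-in, and all function names are illustrative.

```python
import math
import random

def permutation_ll(tokens, cond_logprob, perm):
    """Log-likelihood of `tokens` under one factorization order `perm`:
    sum over steps t of log p(x_{z_t} | x_{z_<t})."""
    total = 0.0
    for t, pos in enumerate(perm):
        # Only tokens earlier in the permutation order are visible as context,
        # so any position can condition on any other -- bidirectional context
        # without input corruption.
        context = {p: tokens[p] for p in perm[:t]}
        total += cond_logprob(tokens[pos], pos, context)
    return total

def expected_ll(tokens, cond_logprob, n_samples=100, seed=0):
    """Monte-Carlo estimate of E_z[log p(x; z)] over uniformly sampled
    permutations z (in practice one permutation is sampled per sequence)."""
    rng = random.Random(seed)
    positions = list(range(len(tokens)))
    vals = []
    for _ in range(n_samples):
        perm = positions[:]
        rng.shuffle(perm)
        vals.append(permutation_ll(tokens, cond_logprob, perm))
    return sum(vals) / len(vals)

# Toy conditional model: uniform over a small vocabulary, ignoring context.
# A real model would be a neural network conditioned on `context` and `pos`.
VOCAB = ["new", "york", "is", "a", "city"]
def uniform_cond(token, pos, context):
    return -math.log(len(VOCAB))
```

With the uniform toy conditional, every factorization order gives the same likelihood; the point of the sketch is only the structure of the objective, where each sampled order lets every position see a different subset of the sequence as context.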


Reviews: XLNet: Generalized Autoregressive Pretraining for Language Understanding

Neural Information Processing Systems

Originality: The architecture is novel compared to recent lines of language-model work, which all used variations of BERT or GPT (SciBERT, MT-DNN, MASS, etc.). The example (the "New York is a city" one) makes sense, but since the permutation is sampled randomly when computing the objective, I still couldn't see why it works better than sequential order, given that humans speak/write in sequential order. Could you add more intuition to the paper? Or have you tried predicting n-grams, compared to permutation? Quality: Very high, considering they did extensive studies on multiple benchmarks; the ablation study is nicely done as well.


Reviews: XLNet: Generalized Autoregressive Pretraining for Language Understanding

Neural Information Processing Systems

The paper proposes XLNet, a generalized autoregressive pretraining method for language representation learning. The paper shows that XLNet outperforms the state-of-the-art method BERT on 12 tasks. The paper is of high quality in terms of clarity, technical soundness, significance, and novelty. The authors successfully addressed the issues pointed out by the reviewers, and the reviewers are very satisfied with the response.



XLNet: Generalized Autoregressive Pretraining for Language Understanding

Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Russ R.; Le, Quoc V.

Neural Information Processing Systems
